regularization method
Regularizing Attention Scores with Bootstrapping
Neo Christopher Chung, Maxim Laletin
Vision transformers (ViT) rely on the attention mechanism to weigh input features, and therefore attention scores have naturally been considered as explanations for their decision-making process. However, attention scores are almost always non-zero, resulting in noisy and diffuse attention maps and limiting interpretability. Can we quantify the uncertainty of attention scores and obtain regularized attention scores? To this end, we consider attention scores of ViT in a statistical framework where independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce bootstrapping for attention scores, which generates a baseline distribution of attention scores by resampling input features. This bootstrap distribution is then used to estimate the significance and posterior probabilities of attention scores. On natural and medical images, the proposed \emph{Attention Regularization} approach demonstrates a straightforward removal of spurious attention arising from noise, drastically improving shrinkage and sparsity. Quantitative evaluations are conducted using both simulation and real-world datasets. Our study highlights bootstrapping as a practical regularization tool when using attention scores as explanations for ViT. Code available: https://github.com/ncchung/AttentionRegularization
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (0.48)
- Health & Medicine > Therapeutic Area (0.46)
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.41)
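The bootstrap procedure described in this abstract can be sketched in a few lines. The snippet below is only an illustration under simplifying assumptions: `attention_fn`, `bootstrap_attention_null`, and `regularize_attention` are hypothetical names, and resampling patches with replacement plus an empirical p-value cutoff is one plausible reading of "resampling input features"; the authors' actual implementation is in the linked repository.

```python
import numpy as np

def bootstrap_attention_null(attention_fn, tokens, n_boot=200, rng=None):
    """Build a null distribution of attention scores by resampling tokens.

    attention_fn: callable mapping a (n_tokens, dim) array of input
        features to a (n_tokens,) array of attention scores
        (e.g., CLS-to-patch scores in a ViT).
    tokens: (n_tokens, dim) input features (e.g., patch embeddings).

    Resampling tokens with replacement breaks any real association
    between a token and its score, so the recomputed scores approximate
    attention under independent noise.
    """
    rng = np.random.default_rng(rng)
    n = tokens.shape[0]
    null_scores = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap resample of patches
        null_scores[b] = attention_fn(tokens[idx])
    return null_scores

def regularize_attention(scores, null_scores, alpha=0.05):
    """Zero out attention scores that the bootstrap null cannot
    distinguish from noise: keep a score only if its empirical
    p-value falls below alpha."""
    # empirical p-value: fraction of null scores at least as large
    pvals = (null_scores >= scores[None, :]).mean(axis=0)
    return np.where(pvals < alpha, scores, 0.0), pvals
```

The surviving scores could then be renormalized before visualization as an attention map; that final step is a common convention rather than anything stated in the abstract.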
Well-tuned Simple Nets Excel on Tabular Datasets
We empirically assess the impact of these regularization cocktails for MLPs in a large-scale empirical study comprising 40 tabular datasets and demonstrate that (i) well-regularized plain MLPs significantly outperform recent state-of-the-art specialized neural network architectures, and (ii) they even outperform strong traditional ML methods, such as XGBoost.
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- North America > United States (0.04)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Michigan (0.04)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Jiangsu Province (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
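For context, a "regularization cocktail" combines several standard regularizers on a plain MLP. The sketch below shows one such combination (dropout, weight decay via AdamW, label smoothing); the particular mix, hyperparameters, and layer sizes are illustrative, not the per-dataset cocktails the paper tunes.

```python
import torch
import torch.nn as nn

# A plain MLP with a small "cocktail" of standard regularizers.
class RegularizedMLP(nn.Module):
    def __init__(self, in_dim, n_classes, hidden=256, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

# Arbitrary sizes for illustration.
model = RegularizedMLP(in_dim=54, n_classes=7)
# Weight decay and label smoothing supply two more cocktail ingredients.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)
```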
R-Drop: Regularized Dropout for Neural Networks
In this paper, we introduce a simple yet more effective alternative for regularizing the training inconsistency induced by dropout, named R-Drop. Concretely, in each mini-batch training step, each data sample goes through the forward pass twice, and each pass is processed by a different sub-model obtained by randomly dropping out some hidden units.
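The two-pass scheme translates directly into a training-step loss. In the sketch below, the symmetric KL consistency term follows the published R-Drop formulation (the abstract excerpt above stops before naming it), and `alpha` is its tunable weighting coefficient; the model must be in training mode so that dropout samples a different sub-model on each pass.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    """One R-Drop training step: the same batch is forwarded twice, so
    dropout yields two different sub-models, and their output
    distributions are pulled together with a symmetric KL term."""
    logits1 = model(x)   # first pass, one random dropout mask
    logits2 = model(x)   # second pass, another random dropout mask

    # Standard cross-entropy, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))

    # Bidirectional KL divergence between the two predicted distributions.
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```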